Four types of context for automatic spelling correction

Author

  • Michael Flor
Abstract

This paper investigates four types of contextual information for improving the accuracy of automatic correction of single-token non-word misspellings. The task is framed as contextually informed re-ranking of correction candidates. The first approach captures immediate local context with word n-gram statistics from a Web-scale language model. The second measures how well a candidate correction fits the semantic fabric of the local lexical neighborhood, using a very large Distributional Semantic Model. The third recognizes a misspelling as an instance of a word that recurs elsewhere in the text and uses that recurrence for re-ranking. The fourth looks at context beyond the text itself: if the approximate topic is known in advance, spelling correction can be biased towards that topic. The effectiveness of the proposed methods is demonstrated on an annotated corpus of 3,000 student essays from international high-stakes English language assessments. The paper also describes an implemented system that achieves high accuracy on this task.
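The re-ranking framing in the abstract can be illustrated with a minimal sketch. It is not the paper's system: the function name `rerank`, the weights, and the toy bigram table (a stand-in for a Web-scale language model) are all hypothetical, and only two of the four context types (local n-grams and word recurrence) are modeled. The semantic-fit and topic signals are omitted.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def rerank(misspelling, candidates, left, right, bigram_counts, doc_tokens,
           w_ngram=1.0, w_recur=1.0):
    """Return correction candidates sorted best-first by a combined score.

    Hypothetical scoring, for illustration only:
      orthographic similarity + weighted log bigram evidence
      + weighted log count of the candidate elsewhere in the same text.
    """
    doc_freq = Counter(token.lower() for token in doc_tokens)
    scored = []
    for cand in candidates:
        # Orthographic similarity between misspelling and candidate, in [0, 1].
        sim = SequenceMatcher(None, misspelling, cand).ratio()
        # Local context: toy bigram counts with the left and right neighbours.
        ngram = (bigram_counts.get((left, cand), 0)
                 + bigram_counts.get((cand, right), 0))
        # Recurrence: how often the candidate already appears in the text.
        recur = doc_freq[cand]
        score = sim + w_ngram * math.log1p(ngram) + w_recur * math.log1p(recur)
        scored.append((score, cand))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in scored]


# Toy data (assumed, not from the paper).
toy_bigrams = {("the", "first"): 50, ("first", "time"): 40, ("the", "forest"): 10}
candidates = ["forest", "first", "frost"]

# Local n-gram context "the ___ time" favours "first".
by_ngrams = rerank("forst", candidates, "the", "time", toy_bigrams, [])

# With no n-gram evidence, repeated use of "forest" in the essay favours it.
essay = "the forest was dark and the forest was quiet".split()
by_recurrence = rerank("forst", candidates, "a", "walk", {}, essay)
```

With a purely additive score like this, the weights decide which kind of context wins when the signals disagree; a real system would need to tune or learn them on annotated data.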

Similar resources

A ranker for semantic spell checking using context-sensitive features

Nowadays, a large volume of documents is generated daily. These documents are generated by different people and therefore contain spelling errors, which lower their quality. Automatic writing assistance tools such as spell checkers/correctors can therefore help to improve document quality. Context-sensitive errors are misspelled words that have been...

Spelling and Grammar Correction for Danish in SCARRIE

This paper reports on work carried out to develop a spelling and grammar corrector for Danish, addressing in particular the issue of how a form of shallow parsing is combined with error detection and correction for the treatment of context-dependent spelling errors. The syntactic grammar for Danish used by the system has been developed with the aim of dealing with the most frequent error types...

Towards Context-Dependent Phonetic Spelling Error Correction in Children's Freely Composed Text for Diagnostic and Pedagogical Purposes

Reading and writing are core competencies of any society. In Germany, international and national comparative studies such as PISA or IGLU have shown that around 25% of German school children do not reach the minimal competence level necessary to function effectively in society by the age of 15. Automated diagnosis and spelling tutoring of children can play an important role in raising their ort...

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) on electronic devices. Furthermore, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also, developing Persian tools will provide Persian progr...

Arabic Spelling Correction using Supervised Learning

In this work, we address the problem of spelling correction in the Arabic language utilizing the new corpus provided by QALB (Qatar Arabic Language Bank) project which is an annotated corpus of sentences with errors and their corrections. The corpus contains edit, add before, split, merge, add after, move and other error types. We are concerned with the first four error types as they contribute...

Journal title:
  • TAL

Volume 53  Issue 

Pages  -

Publication date: 2012